23 research outputs found

    Scalable methods to analyze Semantic Web data

    Semantic Web data is currently being heavily used as a data representation format in scientific communities, social networks, business companies, news portals and other domains. The emergence and availability of Semantic Web data is demanding new methods and tools to efficiently analyze such data and take advantage of the underlying semantics. Although there exist some applications that make use of Semantic Web data, advanced analytical tools are still lacking, preventing the user from exploiting the attached semantics. The main objective of this dissertation is to provide a formal framework that enables the multidimensional analysis of Semantic Web data in a scalable and efficient manner. The success of multidimensional analysis techniques applied to large volumes of structured data in the context of business intelligence, especially for data warehousing and OLAP applications, has prompted us to investigate the application of such techniques to Semantic Web data, which is semi-structured in nature and contains implicit knowledge.

    Statistically-driven generation of multidimensional analytical schemas from linked data

    The ever-growing Linked Data (LD) initiative has given rise to large amounts of open, semi-structured and rich data published on the Web. However, effective analytical tools that aid users in their analysis and go beyond browsing and querying are still lacking. To address this issue, we propose the automatic generation of multidimensional analytical stars (MDAS). The success of the multidimensional (MD) model for data analysis has been in great part due to its simplicity. Therefore, in this paper we aim at automatically discovering MD conceptual patterns that summarize LD. These patterns resemble the MD star schema typical of relational data warehousing. The underlying foundation of our method is a statistical framework that takes into account both concept and instance data. We present an implementation that makes use of the statistical framework to generate the MDAS. We have performed several experiments that assess and validate the statistical approach with two well-known and large LD sets. This research has been partially funded by the “Ministerio de Economía y Competitividad” with contract number TIN2014-55335-R. Victoria Nebot was supported by the UJI Postdoctoral Fellowship program with reference PI14490.

    Exploiting semantic annotations for open information extraction: an experience in the biomedical domain

    The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein interactions and drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has successfully been applied to a large text collection in the biomedical domain and the results are very encouraging. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).

    Building Data Warehouses with Semantic Web Data

    The deployment of the Semantic Web (SW) is now a reality, and the amount of semantic annotations is ever increasing thanks to several initiatives that promote a change in the current Web towards the Web of Data, where the semantics of data become explicit through data representation formats and standards such as RDF(S) and OWL. However, such initiatives have not yet been accompanied by efficient intelligent applications that can exploit the implicit semantics and thus provide more insightful analysis. In this paper, we provide the means for efficiently analyzing and exploring large amounts of semantic data by combining the inference power from the annotation semantics with the analysis capabilities provided by OLAP-style aggregations, navigation, and reporting. We formally present how semantic data should be organized in a well-defined conceptual MD schema, so that sophisticated queries can be expressed and evaluated. Our proposal has been evaluated over a real biomedical scenario, which demonstrates the scalability and applicability of the proposed approach.

    Tailored semantic annotation for semantic search

    This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). This method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the necessary operations for an efficient and effective semantic annotation of the corpus. Firstly, we propose a coarse tailoring of the KRs with respect to the target corpus, with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KR concepts and the statistical framework to perform semantic search. Experiments have been carried out with a corpus about web resources which includes several Life Sciences catalogs and Wikipedia pages related to web resources in general (e.g., databases, tools, services, etc.). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search. We thank anonymous reviewers for their very useful comments and suggestions. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).

    Fonaments d’enginyeria del programari [Fundamentals of Software Engineering]

    Department of Computer Languages and Systems. Course code: EI102

    Interaction Mining: Making Business Sense of Customers Conversations through Semantic and Pragmatic Analysis

    Via the Web, a wealth of information for business research is ready at our fingertips. Analyzing this unstructured information, however, can be very difficult. Analytics has become the business buzzword distinguishing traditional competitors from ‘analytics competitors’ who have dramatically boosted their revenues. The latter competitors distinguish themselves through “expert use of statistics and modeling to improve a wide variety of functions” (Davenport, 2006, p. 105). However, not all information lends itself to statistics and models. Actually, most information on the Web is made for, and by, people communicating through ‘rich’ language. This richness of our language, and with it its real meaning, is typically missed or not adequately accounted for in (statistical) analytics (e.g. text mining) because it is hidden in semantics rather than form (e.g. syntax). In our efforts to turn unstructured data into structured data, important information, and with it our ability to distinguish ourselves from competitors, gets lost.

    Análisis e interpretación de datos estadísticos en 4º de ESO mediante la herramienta R [Analysis and interpretation of statistical data in the 4th year of ESO using the R tool]

    Master's thesis for the University Master's Degree in Teaching in Compulsory Secondary Education and Baccalaureate, Vocational Training and Language Teaching. Code: SAP509. Academic year: 2017/2018. The master's thesis presented here belongs to the educational improvement modality and is framed within the Master's Degree in Teaching in Compulsory Secondary Education and Baccalaureate, Vocational Training and Language Teaching, in the mathematics specialty. In today's society we live surrounded by data. Companies and public bodies alike analyze data daily, interpret them, and summarize them in tables or charts. We also encounter data expressed as statistics about political, social, or economic events in our daily lives, especially in the media. These data are not just numbers, but numbers in a context that help us explain our universe. Stephen Few, a renowned data analyst, says that “numbers have an important story to tell. They rely on you to give them a clear and convincing voice”. Data analysis and interpretation is therefore a fundamental competence for the 21st-century student. Statistics is the science within mathematics that uses sets of numerical data to draw inferences from them. It is therefore the ideal discipline for students to practice and acquire the competence of data analysis and interpretation. This work describes the experience of applying an educational improvement to the statistics unit with students in the 4th year of ESO (compulsory secondary education). Its main objective is for students to learn to analyze and interpret data and to appreciate the importance of statistics, while fostering self-directed learning and the development of critical reasoning. To this end, a team-based statistical analysis project using the statistical tool R is proposed. Secondary objectives of the proposal are for students to value the usefulness of statistical software tools for data analysis, and to promote active learning in collaborative teams that foster knowledge construction and meaningful learning. The results obtained after applying the educational improvement show that the proposal increases student motivation and engagement in learning. Students rate the experience as satisfactory and feel that it has helped their learning, since they have improved their understanding of statistical concepts as well as their capacity for reflection and reasoning.

    Scalable methods to analyze Semantic Web data

    Semantic Web data is currently being heavily used as a data representation format in scientific communities, social networks, business companies, news portals and other domains. The emergence and availability of Semantic Web data is demanding new methods and tools to efficiently analyze such data and take advantage of the underlying semantics. Although there exist some applications that make use of Semantic Web data, advanced analytical tools are still lacking, preventing the user from exploiting the attached semantics. Nowadays, scientific communities, companies, social networks and other web domains increasingly use semantically annotated data, which contribute to the development of the Semantic Web. The growth of this kind of data calls for new methods and tools able to exploit the underlying semantics in order to analyze the data efficiently. Although applications able to use and manage semantically annotated data already exist, they do not exploit the semantics to perform sophisticated analyses.